Sample-Efficient Learning of Mixtures

Authors

  • Hassan Ashtiani
  • Shai Ben-David
  • Abbas Mehrabian
Abstract

We consider PAC learning of probability distributions (a.k.a. density estimation), where we are given an i.i.d. sample generated from an unknown target distribution, and want to output a distribution that is close to the target in total variation distance. Let F be an arbitrary class of probability distributions, and let F^k denote the class of k-mixtures of elements of F. Assuming the existence of a method for learning F with sample complexity m_F(ε) in the realizable setting, we provide a method for learning F^k with sample complexity O(k log k · m_F(ε)/ε^2) in the agnostic setting. Our mixture learning algorithm has the property that, if the F-learner is proper, then the F^k-learner is proper as well. We provide two applications of our main result. First, we show that the class of mixtures of k axis-aligned Gaussians in R^d is PAC-learnable in the agnostic setting with sample complexity Õ(kd/ε^4), which is tight in k and d. Second, we show that the class of mixtures of k Gaussians in R^d is PAC-learnable in the agnostic setting with sample complexity Õ(kd^2/ε^4), which improves the previous known bounds of Õ(k^3 d^2/ε^4) and Õ(k^4 d^4/ε^2) in its dependence on k and d.
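
To make the abstract's objects concrete, the following is a minimal sketch, not the authors' algorithm. It shows total variation distance and a k-mixture on a finite domain, plus the scaling of the stated bound k log k · m_F(ε)/ε^2. The function names, the finite-domain restriction, the choice to set the hidden constant to 1, and the example m_F are all assumptions made for illustration.

```python
import math


def total_variation(p, q):
    """Total variation distance between two distributions on the same finite domain."""
    if len(p) != len(q):
        raise ValueError("distributions must share a domain")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def mixture(weights, components):
    """k-mixture: a convex combination of k component distributions."""
    if len(weights) != len(components):
        raise ValueError("need one weight per component")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing weights must sum to 1")
    size = len(components[0])
    return [sum(w * c[i] for w, c in zip(weights, components)) for i in range(size)]


def mixture_bound_scaling(k, eps, m_f):
    """Scaling k * log(k) * m_F(eps) / eps^2, with the hidden constant set to 1.

    m_f is a hypothetical sample-complexity function for the base class F.
    Meaningful for k >= 2, since log(1) = 0.
    """
    return k * math.log(k) * m_f(eps) / eps ** 2


# Two point masses on a two-element domain, mixed with equal weights.
target = mixture([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]])
print(target)                                 # the uniform distribution
print(total_variation(target, [1.0, 0.0]))    # distance to one component

# Hypothetical base-class sample complexity m_F(eps) = 1/eps.
print(mixture_bound_scaling(2, 0.5, lambda e: 1.0 / e))
```

The last call only illustrates how the bound grows with k and 1/ε; actual sample sizes depend on constants that the O(·) notation hides.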

Similar resources

Minimax Theory for High-dimensional Gaussian Mixtures with Sparse Mean Separation

While several papers have investigated computationally and statistically efficient methods for learning Gaussian mixtures, precise minimax bounds for their statistical performance as well as fundamental limits in high-dimensional settings are not well-understood. In this paper, we provide precise information theoretic bounds on the clustering accuracy and sample complexity of learning a mixture...

Near-Optimal-Sample Estimators for Spherical Gaussian Mixtures

Statistical and machine-learning algorithms are frequently applied to high-dimensional data. In many of these applications data is scarce, and often much more costly than computation time. We provide the first sample-efficient polynomial-time estimator for high-dimensional spherical Gaussian mixtures. For mixtures of any k d-dimensional spherical Gaussians, we derive an intuitive spectral-estim...

Agnostic Distribution Learning via Compression

We prove that Θ̃(kd^2/ε^2) samples are necessary and sufficient for learning a mixture of k Gaussians in R^d, up to error ε in total variation distance. This improves both the known upper bound and lower bound for this problem. For mixtures of axis-aligned Gaussians, we show that Õ(kd/ε^2) samples suffice, matching a known lower bound. Moreover, these results hold in an agnostic learning setting as ...

Learning Mixtures of Discrete Product Distributions using Spectral Decompositions

We study the problem of learning a distribution from samples, when the underlying distribution is a mixture of product distributions over discrete domains. This problem is motivated by several practical applications such as crowdsourcing, recommendation systems, and learning Boolean functions. The existing solutions either heavily rely on the fact that the number of mixtures is finite or have s...

Learning High-Dimensional Mixtures of Graphical Models

We consider unsupervised estimation of mixtures of discrete graphical models, where the class variable corresponding to the mixture components is hidden and each mixture component over the observed variables can have a potentially different Markov graph structure and parameters. We propose a novel approach for estimating the mixture components, and our output is a tree-mixture model which serve...

Journal:
  • CoRR

Volume: abs/1706.01596

Publication year: 2017